7 research outputs found

    Throughput analysis for a high-performance FPGA-accelerated real-time search application

    We propose an FPGA design for the relevancy computation part of a high-throughput real-time search application. The application matches terms in a stream of documents against a static profile held in off-chip memory. We present a mathematical analysis of the throughput of the application and apply it to the problem of scaling the Bloom filter used to discard nonmatches.

    A low cost reconfigurable soft processor for multimedia applications: design synthesis and programming model

    This paper presents an FPGA implementation of a low-cost 8-bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by media processing and other domains, and to be easily integrable into a 2D array. This paper investigates the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized across the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model, which allows low-level programming. Throughput results for popular benchmarks coded using the programming model and a cycle-accurate simulator are presented.

    MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor

    This paper presents an architecture and implementation details for MORA, a novel coarse-grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low-cost, high-throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of the hardware architecture and a low-level programming language throughout the design cycle. Implementation details for a single MORA processor and a benchmark evaluation using a cycle-accurate simulator are presented.

    A C++-embedded Domain-Specific Language for programming the MORA soft processor array

    MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine, and provides examples to illustrate the programming model and performance.

    Radiation-hardened reconfigurable array with instruction roll-back

    This letter presents the design and evaluation of a coarse-grained reconfigurable array, hardened against radiation-induced transient errors. The architecture consists of an 8 × 8 array of reconfigurable cells, each provided with built-in soft-error detection and instruction roll-back control. We also present the communication management scheme between the processors in the presence of varying degrees of single event upsets. The impact on throughput when evaluating an 8 × 8 discrete wavelet transform (DWT) and an 8 × 8 discrete cosine transform (DCT) is also presented.

    A few lines of code, thousands of cores: high-level FPGA programming using vector processor networks

    MORA-C++ is a novel, high-efficiency FPGA dataflow programming framework. The framework, which consists of a compile-time configurable network of Vectorized Processors-in-Memory (PIM) cores, is programmed using a high-level C++ API. In this paper we present our work on support for vectorized cores and variable-size data path widths. Measurement results for our implementation on the SGI RC-100 platform show that, by instantiating over a thousand customized cores, the MORA-C++ framework can achieve throughputs very close to the I/O bandwidth of the system for a high-performance DCT application.